\section{Introduction}
\label{sect:intro}


%Virtualization is the engine behind many popular cloud computing platforms.
%Amazon, Alibaba,  and many others have provided public VM clouds that host
%tens of thousands of  virtual machines(VM) created for various active users.
Periodic  archiving of virtual machine (VM) snapshots is important 
for long-term data  retention and fault recovery. 
%in case of machine failures.  
For example, daily backup of VM images  is conducted automatically 
at Alibaba which provides the largest public cloud service in China.
The cost of frequent backup of VM snapshots is  high because of the huge storage demand.
This issue has been addressed by   storage data deduplication ~\cite{venti02,bottleneck08} that
identifies redundant content duplicates among snapshots.  
One architectural approach is to attach  a separate backup system with deduplication support such as ones
from  EMC~\cite{EMC} or NetApp{NetApp}
to the cloud cluster, and  the snaphosts of every virtual machine is   periodically transferred to
the attached backup system.  Such a dedicated backup  configuration can be expensive, 
considering that significant networking  and computing resource  is required 
to transfer undeduplicatd data and conduct signature comparison. 
Our work is targeted applications where data processing and archiving  can be conducted
in an inexpensive commodity storage cluster using a distributed file system
such as the Google file system or Hadoop~\cite{googlefs03,hdfs10}.
Resource sharing is common, especially in the cloud where many computing and storage
services are collocated in the same physical machines, so we 
expect the storage cluster to be non-dedicated.

With  resource sharing with other services as an assumption, we are motivated to develop a 
low-profile backup solution with deduplication. 
%Deduplication is a major


%This paper seeks for a low-cost architecture option and considers a backup
%service that uses the existing cloud computing resource.
%Since most of backup data is not used in practice, system resource in such a service is not fully utilized. 
Performing deduplication adds significant  memory cost for comparison of content fingerprints. 
Since each physical machine in a cluster  hosts many VMs, memory contention happens frequently. 
Cloud providers often wish that the backup service only consumes  small or modest resources 
%with a minimal impact to the existing cloud services.  Another challenge for backup with deduplication is 
with a minimal impact to the existing cloud services.  Another challenge is 
%that deletion of old snapshots compete for computing resource as well. That is because data dependence created 
that deletion of old snapshots compete for computing resource as well, because data dependence created 
by duplicate relationship among snapshots  adds processing complexity.
% especially when  VMs can migrate around in the cloud. 

%Among the three factors - time, cost and deduplication efficiency, one of them
%has to be compromised for the other two. For instance, if we were building a
%deduplication system that has a high rate of duplication detection and has a
%very fast response time, 
%it would need a lot of memory to hold fingerprint index and cache.  This leads to a compromise on cost. 
%Our objective is to lower the cost incurred while sustaining the highest de-duplication ratio
%and  a sufficient throughput in dealing with a large number of VM images.% This business model partially  resembles that of Amazon Glacier.

Among the three factors - time, cost, and deduplication efficiency, our main
objective is low cost, so as to not significantly impact primary services. By
sacrificing memory a faster deduplication system
can be built, but then machines must be dedicated to backup, which goes
against the converged architecture model that we are supporting. For daily
backup jobs, it is ok if the job takes 2-3 hours, as long as the resource
usage is minimal and it can be finished during lower usage hours of the day.

%Our solution  only uses a very small amount of memory space during backup while completing  snapshot backup and 
%deletion in a reasonable amount of time. 
The traditional approach to  deduplication is an inline approach which follows
a sequence of block reading, duplicate detection,  and non-duplicate  block write to the 
backup storage.  
Our key idea  is to  first perform parallel duplicate detection for VM content blocks 
among all machines before performing actual data backup. Each machine
accumulates detection requests and  then performs detection   partition by partition 
with minimal resource usage.
Fingerprint based partitioning allows highly parallel duplicate detection  and also simplifies 
reference counting management.  
% as the entire process can also be divided into a
%parition-wise  deletion. 
The tradeoff is that every machine has to read dirty segments twice
and that some deduplication requests are delayed for staged parallel processing.
Another tradeoff of our efficient batched approach, related to reading each
segment twice, is that we must maintain a consistent view of the disk between
the first and second read. We use Copy on Write (CoW) to acheive this, at the
cost of additional disk space.
With careful parallelism and buffer  management,
this multi-stage detection scheme can provide  a sufficient throughput for VM
backup. To minimize the CoW cost and adjust the balance between total backup
time and the backup times for individual VMs, we adopt a multiple-batch
approach based on the dual bin packing problem.


%Our study shows that common data blocks
%occupy significant amount of storage space while they only take
%a small amount of resources to deduplicate.
%Separating deduplication into multi levels effectively accomplish the major space saving goal
%compare the global complete deduplication scheme, at the same time it makes
%the backup of different VMs to be independent for better fault tolerance.

%Restricting the scope of global deduplication reduces
%the inter-component  data dependenc during machine failure.

%Several operations are provided for creating and managing snapshots and snapshot trees,
%such as create snapshots, revert to any snapshot, and remove snapshots.
%VM snapshots is different from traditional file system backup from a few aspects:
%\begin{enumerate}
%\item The backup loads are heavy but resources are limited. The whole VM cluster have PBs of data and need to be backuped at daily basis, if not more frequently. But at the same time snapshot tasks must not affect the normal VM operations, which means only a tiny slice of CPU and memory can be used for this purpose.
%\item No file system semantics are provided. Snapshot operations are taken place at the virtual device driver level, which means no fine-grained file system metadata can be used to determine the changed data. Only raw access information at disk block level are provided.
%\item Snapshots can be shared between users. This does not only complicates the parent-child relationship between snapshots, but also require all snapshots be managed in a global namespace.
%\end{enumerate}

%In this paper we intorduce the deduplication scheme of snapshot storage in Aliyun's VM cloud. We exploit the 
\comments{
The rest of the paper is arranged as follows. 
Section ~\ref{sect:background}
discusses on the background and related work.
Section ~\ref{sect:arch} presents our snapshot backup solution with request and index partitioning.
Section ~\ref{sect:analysis} summarizes  our analysis of the system performance.
Section ~\ref{sect:scheduling} presents our round scheduling algorithm based on our performance model.
Section ~\ref{sect:exper} presents  our evaluation results on the effectiveness of our system.
Section ~\ref{sect:final} concludes this paper.
}
% talks about Aliyun's VM snapshot system and deduplication process. In section 4 we present
%the snapshot deduplication results of our real user VMs.
